Approximate Policy Iteration for Markov Decision Processes via Quantitative Adaptive Aggregations

Authors

  • Alessandro Abate
  • Milan Ceska
  • Marta Z. Kwiatkowska
Abstract

We consider the problem of finding an optimal policy in a Markov decision process that maximises the expected discounted sum of rewards over an infinite time horizon. Since the explicit iterative dynamic programming scheme does not scale as the dimension of the state space increases, a number of approximate methods have been developed. These are typically based on value or policy iteration, enabling further speedups through lumped and distributed updates, or by employing succinct representations of the value functions. However, none of the existing approximate techniques provides general, explicit and tunable bounds on the approximation error, a problem particularly relevant when the level of accuracy affects the optimality of the policy. In this paper we propose a new approximate policy iteration scheme that mitigates the state-space explosion problem by adaptive state-space aggregation, while providing rigorous and explicit error bounds that can be used to control the optimality level of the obtained policy. We evaluate the new approach on a case study, demonstrating that the state-space reduction considerably accelerates the policy iteration scheme while meeting the required level of precision.
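To make the approach concrete, the sketch below shows plain policy iteration in which the evaluation step is solved on a lumped model induced by a fixed state-space partition with uniform within-block weights. It is only a minimal illustration of aggregation-based policy iteration: the paper's adaptive refinement of the partition and its quantitative error bounds are not reproduced here, and all names (aggregate_policy_iteration, partition, Phi) are illustrative.

```python
import numpy as np

def aggregate_policy_iteration(P, R, gamma, partition, max_iters=100):
    """Policy iteration with evaluation on a lumped model.

    P: (n_actions, n_states, n_states) transition tensor; R: (n_states,
    n_actions) rewards; partition[s]: index of the block containing state s.
    """
    n_states, n_actions = R.shape
    n_blocks = int(max(partition)) + 1
    # Hard aggregation matrix: Phi[s, b] = 1 iff state s belongs to block b.
    Phi = np.zeros((n_states, n_blocks))
    Phi[np.arange(n_states), partition] = 1.0
    W = (Phi / Phi.sum(axis=0)).T          # uniform weights within each block
    policy = np.zeros(n_states, dtype=int)
    v = np.zeros(n_states)
    for _ in range(max_iters):
        # Evaluation on the aggregate model: n_blocks equations, not n_states.
        P_pi = np.array([P[policy[s], s] for s in range(n_states)])
        r_pi = R[np.arange(n_states), policy]
        v_blocks = np.linalg.solve(np.eye(n_blocks) - gamma * (W @ P_pi @ Phi),
                                   W @ r_pi)
        v = Phi @ v_blocks                 # lift block values back to states
        # Improvement step on the full model.
        q = R + gamma * np.stack([P[a] @ v for a in range(n_actions)], axis=1)
        new_policy = q.argmax(axis=1)
        if np.array_equal(new_policy, policy):
            break
        policy = new_policy
    return policy, v
```

On a toy model one would call aggregate_policy_iteration(P, R, 0.9, partition) with a partition chosen by hand, e.g. grouping states with similar rewards; the paper's contribution is precisely to refine such a partition adaptively while keeping the induced error explicit.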

Similar Articles

Approximate Policy Iteration for Semi-Markov Control Revisited

The semi-Markov decision process can be solved via reinforcement learning without generating its transition model. We briefly review the existing algorithms based on approximate policy iteration (API) for solving this problem under discounted and average reward over an infinite horizon. API techniques have recently attracted significant interest in the literature. We first present and analyze a...
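As a rough, model-free illustration of the idea (not the algorithm presented in the cited paper), the update below evaluates a fixed policy on sampled semi-Markov transitions, where the sojourn time tau enters through a continuous-time discount rate beta; the function name and parameters are assumptions.

```python
import math

def smdp_policy_eval_step(Q, policy, s, a, reward, tau, s_next,
                          alpha=0.1, beta=0.05):
    """TD update evaluating the fixed `policy` from one sampled transition;
    Q maps (state, action) pairs to value estimates, e.g. a defaultdict."""
    discount = math.exp(-beta * tau)   # sojourn-time-dependent discounting
    target = reward + discount * Q[(s_next, policy[s_next])]
    Q[(s, a)] += alpha * (target - Q[(s, a)])
```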


Approximate Policy Iteration for Markov Control Revisited

Q-Learning is based on value iteration and remains the most popular choice for solving Markov Decision Problems (MDPs) via reinforcement learning (RL), where the goal is to bypass the transition probabilities of the MDP. Approximate policy iteration (API), based on modified policy iteration, is another RL technique, though not as widely used as Q-Learning. In this paper, we present and analyze an API a...
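For contrast with API, here is the textbook Q-learning update the snippet alludes to: the max over next actions is a sampled value-iteration backup, so the transition probabilities of the MDP are never consulted. This is a generic sketch, not code from the cited paper.

```python
def q_learning_step(Q, s, a, reward, s_next, actions, alpha=0.1, gamma=0.95):
    """One Q-learning update; Q maps (state, action) pairs to estimates,
    e.g. a collections.defaultdict(float)."""
    best_next = max(Q[(s_next, b)] for b in actions)  # value-iteration backup
    Q[(s, a)] += alpha * (reward + gamma * best_next - Q[(s, a)])
```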


Approximate Policy Iteration with a Policy Language Bias

We explore approximate policy iteration (API), replacing the usual cost-function learning step with a learning step in policy space. We give policy-language biases that enable solution of very large relational Markov decision processes (MDPs) that no previous technique can solve. In particular, we induce high-quality domain-specific planners for classical planning domains (both deterministic and...
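A minimal sketch of the policy-space learning step this describes (with hypothetical helpers: rollout_q, a rollout-based action-value estimate of the current policy, and fit_classifier, any supervised learner; not the authors' implementation):

```python
def classification_api_step(states, rollout_q, actions, fit_classifier):
    """Label each sampled state with the action that looks best under rollout
    estimates, then fit a classifier to those labels instead of fitting a
    cost/value function."""
    labels = [max(actions, key=lambda a: rollout_q(s, a)) for s in states]
    return fit_classifier(states, labels)  # the improved policy, as a classifier
```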


Approximate Policy Iteration with a Policy Language Bias (draft)

We explore approximate policy iteration (API), replacing the usual cost-function learning step with a learning step in policy space. We give policy-language biases that enable solution of very large relational Markov decision processes (MDPs) that no previous technique can solve. In particular, we induce high-quality domain-specific planners for classical planning domains (both deterministic and...


Computing Exploration Policies via Closed-form Least-Squares Value Iteration

Optimal adaptive exploration involves sequentially selecting observations that minimize the uncertainty of state estimates. Due to the complexity of the problem, researchers typically settle for greedy adaptive strategies that are sub-optimal. In contrast, we model the problem as a belief-state Markov Decision Process and show how a non-greedy exploration policy can be computed using least-squares value iterati...
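The fragment below sketches one generic round of least-squares (fitted) value iteration over linear features, to illustrate the closed-form flavour; it is not the belief-state construction of the cited paper, and phi, actions, and the sample format are assumptions.

```python
import numpy as np

def lsvi_round(samples, phi, actions, w, gamma=0.95):
    """One fitted round: regress Bellman backup targets onto linear features.
    samples: list of (s, a, r, s_next); phi(s, a): feature vector; w: current
    weights of the estimate Q(s, a) = phi(s, a) @ w."""
    X = np.array([phi(s, a) for (s, a, _, _) in samples])
    y = np.array([r + gamma * max(phi(s2, b) @ w for b in actions)
                  for (_, _, r, s2) in samples])
    # Closed-form least-squares fit of the backed-up values.
    w_next, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w_next
```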



Journal:

Volume:   Issue:

Pages:  -

Publication date: 2016